    TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System

    Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained on historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, ant colony algorithm, and genetic algorithm, is used to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta-learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperforms other classification techniques recently proposed in the literature. The results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. This has not usually been considered in IDS research thus far and therefore adds value to the experimental results achieved by the proposed classifier.
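The feature-selection-plus-ensemble pipeline can be sketched as follows. This is a minimal stand-in, not the authors' implementation: scikit-learn provides neither rotation forest nor the REPTree learner, so bagged pruned decision trees play the role of the ensemble, and `SelectKBest` replaces the PSO/ACO/GA hybrid feature selection; the synthetic dataset and all parameters are assumptions.

```python
# Illustrative two-stage pipeline: feature-subset selection followed by a
# bagged ensemble of pruned trees (stand-ins for the paper's components).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an intrusion-detection dataset
X, y = make_classification(n_samples=1000, n_features=40, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(
    SelectKBest(f_classif, k=10),   # stand-in for the PSO/ACO/GA selection stage
    BaggingClassifier(DecisionTreeClassifier(max_depth=5),  # pruned-tree base learner
                      n_estimators=50, random_state=0),
)
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```

The pipeline keeps feature selection inside the model so it is refitted on each training fold, avoiding selection leakage into the test set.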

    A utility-based model to define the optimal data quality level in IT service offerings

    In the information age, enterprises base or enrich their core business activities on the provision of informative services. For this reason, organizations are becoming increasingly aware of data quality issues, which concern evaluating the ability of a data collection to meet users’ needs. Data quality is a multidimensional and subjective issue, since it is defined by a variety of criteria whose definition and evaluation depend strictly on the context and the users involved. Thus, when considering data quality, the users’ perspective should always be treated as fundamental. Authors in the data quality literature agree that providers should adapt, and consequently improve, their service offerings in order to fully satisfy users’ demands. However, we argue that, in service provisioning, providers are subject to restrictions stemming, for instance, from cost-benefit assessments. Therefore, we identify the need to reconcile providers’ and users’ quality targets when defining the optimal data quality level of an informative service. Defining such an equilibrium is a complex issue, since each type of user accessing the service may define different utilities for the provided information. Considering this scenario, the paper presents a utility-based model of the providers’ and customers’ interests developed on the basis of multi-class offerings. The model is exploited to analyze the optimal service offerings that allow the efficient allocation of quality improvement activities by the provider.
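As a toy numerical illustration of such an equilibrium, one can pit a concave user utility against a convex provider cost over a quality level and locate the interior optimum. The functional forms below are hypothetical, not the paper's model.

```python
# Hypothetical trade-off: user utility with diminishing returns vs. a
# superlinearly growing quality-improvement cost; the optimum balances both.
import numpy as np

q = np.linspace(0.0, 1.0, 1001)      # candidate data quality levels
user_utility = np.sqrt(q)            # users value quality with diminishing returns
provider_cost = 0.8 * q ** 2         # improvement cost grows faster than linearly
net_utility = user_utility - provider_cost
q_opt = float(q[np.argmax(net_utility)])   # interior optimum, neither 0 nor 1
```

Because marginal utility falls while marginal cost rises, the optimal quality level sits strictly between "no improvement" and "perfect quality", which is the intuition behind reconciling the two parties' targets.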

    Online anomaly detection using statistical leverage for streaming business process events

    While several techniques for detecting trace-level anomalies in event logs in offline settings have appeared recently in the literature, such techniques are currently lacking for online settings. Event log anomaly detection in online settings can be crucial for discovering anomalies in process execution as soon as they occur and, consequently, for promptly taking early corrective actions. This paper describes a novel approach to event log anomaly detection on event streams that uses statistical leverage. Leverage has been used extensively in statistics to develop measures that identify outliers, and it has been adapted in this paper to the specific scenario of event stream data. The proposed approach has been evaluated on both artificial and real event streams. Comment: 12 pages, 4 figures; Proceedings of the 1st International Workshop on Streaming Analytics for Process Mining (SA4PM 2020), in conjunction with the International Conference on Process Mining; accepted for publication (Sep 2020).
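The statistical notion the paper builds on can be sketched compactly: leverage scores are the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ, and high-leverage observations are outlier candidates. How traces from an event stream are encoded into the matrix X is the paper's contribution and is not reproduced here; the random data below is purely illustrative.

```python
# Minimal sketch of leverage-based outlier scoring on a plain data matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))         # 50 observations, 3 features
X[0] = [8.0, 8.0, 8.0]               # inject one anomalous observation

H = X @ np.linalg.inv(X.T @ X) @ X.T # hat matrix
leverage = np.diag(H)                # one leverage score per observation
anomaly_index = int(np.argmax(leverage))
```

A useful sanity check is that the leverage scores sum to the rank of X (here 3), since the trace of a projection matrix equals its rank.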

    Stability Metrics for Enhancing the Evaluation of Outcome-Based Business Process Predictive Monitoring

    Outcome-based predictive process monitoring deals with predicting the outcomes of running cases in a business process using feature vectors extracted from completed traces in an event log. Traditionally, in outcome-based predictive monitoring, a different model is developed for each bucket containing a different type of feature vectors. This allows us to extend the traditional evaluation of the quality of process outcome prediction models beyond simply measuring the overall performance, developing a quality assessment framework based on three metrics: one considering the overall performance on all feature vectors; one considering the different levels of performance achieved on feature vectors belonging to individual buckets, i.e., the stability of the performance across buckets; and one considering the stability of the individual predictions obtained, accounting for how close the predicted probabilities are to the cutoff thresholds used to determine the predicted labels. The proposed metrics allow us to evaluate, given a set of alternative designs, i.e., combinations of classifier and bucketing method, the quality of the predictions of each alternative. For this evaluation, we suggest using either the concept of Pareto-optimality or a scenario-based scoring method. We discuss an evaluation of the proposed framework conducted with real-life event logs.
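The third metric's intuition, that probabilities near the cutoff make unstable predictions, can be sketched as a normalised distance from the threshold. The exact definition used in the paper may differ; this score is an illustrative assumption.

```python
# Hedged sketch: a prediction is more stable the further its predicted
# probability lies from the cutoff used to assign the label.
import numpy as np

def prediction_stability(probs, cutoff=0.5):
    """Mean distance of predicted probabilities from the cutoff, scaled to [0, 1]."""
    probs = np.asarray(probs, dtype=float)
    return float(np.mean(np.abs(probs - cutoff)) / max(cutoff, 1.0 - cutoff))

confident = prediction_stability([0.05, 0.95, 0.90])   # far from the cutoff
borderline = prediction_stability([0.45, 0.55, 0.50])  # close to the cutoff
```

Under this score, a model whose probabilities cluster near 0 and 1 ranks as more stable than one hovering around the threshold, even when both yield the same labels.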

    Measuring the Stability of Process Outcome Predictions in Online Settings

    Predictive Process Monitoring aims to forecast the future progress of process instances using historical event data. As predictive process monitoring is increasingly applied in online settings to enable timely interventions, evaluating the performance of the underlying models becomes crucial for ensuring their consistency and reliability over time. This is especially important in high-risk business scenarios where incorrect predictions may have severe consequences. However, predictive models are currently usually evaluated using a single, aggregated value or a time-series visualization, which makes it challenging to assess their performance and, specifically, their stability over time. This paper proposes an evaluation framework for assessing the stability of models for online predictive process monitoring. The framework introduces four performance meta-measures: the frequency of significant performance drops, the magnitude of such drops, the recovery rate, and the volatility of performance. To validate this framework, we applied it to two artificial and two real-world event logs. The results demonstrate that these meta-measures facilitate the comparison and selection of predictive models for different risk-taking scenarios. Such insights are of particular value to enhance decision-making in dynamic business environments. Comment: 8 pages, 3 figures; Proceedings of the 5th International Conference on Process Mining (ICPM 2023).
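The four meta-measures can be sketched over a series of per-window accuracy values. The drop threshold and the recovery criterion below are illustrative choices, not necessarily those of the paper.

```python
# Sketch of four stability meta-measures over windowed model performance:
# drop frequency, drop magnitude, recovery rate, and volatility.
import numpy as np

def stability_meta_measures(acc, drop_threshold=0.1):
    acc = np.asarray(acc, dtype=float)
    diffs = np.diff(acc)
    drop_idx = np.where(diffs < -drop_threshold)[0]   # significant drops
    # A drop at step i counts as recovered if performance later regains
    # its pre-drop level acc[i].
    recovered = sum(1 for i in drop_idx if np.any(acc[i + 2:] >= acc[i]))
    return {
        "drop_frequency": len(drop_idx) / len(diffs),
        "drop_magnitude": float(-diffs[drop_idx].mean()) if len(drop_idx) else 0.0,
        "recovery_rate": recovered / len(drop_idx) if len(drop_idx) else 1.0,
        "volatility": float(acc.std()),
    }

measures = stability_meta_measures([0.9, 0.9, 0.6, 0.9, 0.9])
```

On the example series, one significant drop of 0.3 occurs and is later recovered, so the measures separate a model that dips and rebounds from one that degrades permanently.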

    Leveraging a Heterogeneous Ensemble Learning for Outcome-Based Predictive Monitoring Using Business Process Event Logs

    Outcome-based predictive process monitoring concerns predicting the outcome of a running process case using historical events stored as so-called process event logs. This prediction problem has been approached using different learning models in the literature. Ensemble learners have been shown to be particularly effective in outcome-based business process predictive monitoring, even when compared with learners exploiting complex deep learning architectures. However, the ensemble learners that have been used in the literature rely on weak base learners, such as decision trees. In this article, an advanced stacking ensemble technique for outcome-based predictive monitoring is introduced. The proposed stacking ensemble employs strong learners as base classifiers, i.e., other ensembles. More specifically, we consider stacking of random forests, extreme gradient boosting machines, and gradient boosting machines to train a process outcome prediction model. We evaluate the proposed approach using publicly available event logs. The results show that the proposed model is a promising approach for the outcome-based prediction task. We extensively compare the performance differences among the proposed methods and the base strong learners, also using statistical tests to demonstrate the generalizability of the results obtained.
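Stacking strong ensembles as base classifiers can be sketched directly with scikit-learn. The paper also stacks extreme gradient boosting (XGBoost); scikit-learn's own gradient boosting is used here so the example stays self-contained, and the synthetic data stands in for event-log feature vectors.

```python
# Sketch of a stacking ensemble whose base learners are themselves ensembles
# (random forest and gradient boosting), combined by a logistic-regression
# meta-learner via cross-validated predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combining base outputs
    cv=3,                                  # out-of-fold predictions avoid leakage
)
stack.fit(X_tr, y_tr)
accuracy = stack.score(X_te, y_te)
```

The `cv` argument matters: the meta-learner is trained on out-of-fold predictions of the base ensembles, which keeps it from simply memorising their training-set outputs.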

    Exploring the Suitability of Rule-Based Classification to Provide Interpretability in Outcome-Based Process Predictive Monitoring

    The development of models for process outcome prediction using event logs has evolved in the literature with a clear focus on performance improvement. In this paper, we take a different perspective, focusing on obtaining interpretable predictive models for outcome prediction. We propose to use association rule-based classification, which results in inherently interpretable classification models. Although association rule mining has been used with event logs for process model approximation and anomaly detection in the past, its application to an outcome-based predictive model is novel. Moreover, we propose two ways of visualising the rules obtained to increase the interpretability of the model. First, the rules composing a model can be visualised globally. Second, given a running case on which a prediction is made, the rules influencing the prediction for that particular case can be visualised locally. The experimental results on real-world event logs show that in most cases the performance of the rule-based classifier (RIPPER) is close to that of traditional machine learning approaches. We also show the application of the global and local visualisation methods to real-world event logs.
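What an inherently interpretable, rule-shaped model looks like can be illustrated as follows. This is a stand-in only: scikit-learn ships no RIPPER or association-rule classifier, so a shallow decision tree is printed as if-then rules; the iris dataset is arbitrary and unrelated to event logs.

```python
# Illustrative stand-in for a rule-based classifier: a depth-2 decision tree
# rendered as human-readable if-then rules via export_text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(clf, feature_names=["sep_len", "sep_wid", "pet_len", "pet_wid"])
print(rules)   # each root-to-leaf path reads as one classification rule
```

Each printed branch corresponds to a rule of the form "if feature ≤ threshold and … then class", which is the kind of artefact the global and local visualisations described above would operate on.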